Data Source

The data here are a re-creation of the 2016 analysis from Vox, which Vox originally modeled off of the Washington State Department of Health’s analysis. They were derived from tract-level Census Bureau data for “poverty status in the last 12 months” (Table S1701) and “year structure built” (Table B25034).

How the Data Were Constructed

The final product in this dataset is a lead risk score ranging from 1-10, 1 indicating very little lead exposure risk and 10 indicating very high lead exposure risk. These lead risk scores take into account both the age of housing and poverty status, which have both been shown to influence the risk of lead poisoning in children. (There are other factors that influence lead exposure, but it is often hard to get good data on those things.)

First, I imported the poverty and housing data from the US Census Bureau for all 50 states. Then I removed all census tracts that had missing observations for number of households or number of individuals for whom poverty status is determined.

Then I created five categories for housing risk:

The numbers that each category is multiplied by indicate the percentage of households in that age range that contain lead paint, as established by Jacobs et al (2002).

I then calculated the housing risk for each census tract by adding together the five age categories and dividing that by the total number of households.

To calculate poverty risk, I divided the number of people under 125% of the poverty line by the total number of people.

Then I standardized the housing and poverty risks by calculating z-scores for them. After standardizing them, I created a weighted lead risk, with housing weighted at 0.58, and poverty weighted at 0.42. Then I added the weighted lead risks to create a combined lead risk. Those combined risks were then split up into deciles to create the lead risk scores from 1-10.

All of this was done at the national level first and then restricted to the Charlottesville area, so all lead risk scores are relative to the rest of the country.

Variable Descriptions

glimpse(leadrisk)
## Rows: 50
## Columns: 19
## $ GEOID                 <dbl> 51065020200, 51003010201, 51003010202, 510030104…
## $ NAME                  <chr> "Census Tract 202, Fluvanna County, Virginia", "…
## $ age_39                <dbl> 374.00, 47.60, 87.72, 234.60, 27.20, 17.00, 7.48…
## $ age40_59              <dbl> 132.44, 8.60, 14.19, 102.34, 6.45, 8.17, 3.01, 3…
## $ age60_79              <dbl> 36.24, 36.56, 36.64, 39.52, 30.88, 72.72, 11.60,…
## $ age80_99              <dbl> 18.72, 28.08, 15.39, 17.76, 19.77, 41.82, 26.31,…
## $ age00_plus            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ total                 <dbl> 2297, 1995, 1359, 2137, 1512, 2812, 2000, 4186, …
## $ sum_housing_risk      <dbl> 561.40, 120.84, 153.94, 394.22, 84.30, 139.71, 4…
## $ housing_risk          <dbl> 24.440575, 6.057143, 11.327447, 18.447356, 5.575…
## $ total_poverty_status  <dbl> 4804, 4790, 3093, 4167, 3000, 6095, 4297, 7017, …
## $ total_under_125pct    <dbl> 439, 114, 149, 388, 103, 1176, 452, 901, 1166, 8…
## $ poverty_risk          <dbl> 9.138218, 2.379958, 4.817329, 9.311255, 3.433333…
## $ z_housing_risk        <dbl> 0.2870704, -0.9607426, -0.6030100, -0.1197315, -…
## $ z_poverty_risk        <dbl> -0.734216628, -1.228102854, -1.049982438, -0.721…
## $ weighted_housing_risk <dbl> 0.16650086, -0.55723072, -0.34974578, -0.0694442…
## $ weighted_poverty_risk <dbl> -0.308370984, -0.515803199, -0.440992624, -0.303…
## $ leadriskscore_raw     <dbl> -0.14187013, -1.07303392, -0.79073840, -0.372504…
## $ lead_risk_rank        <dbl> 5, 1, 2, 4, 1, 3, 1, 2, 2, 3, 5, 5, 8, 10, 6, 5,…

Summaries

leadrisk %>% select(-c(GEOID:NAME)) %>% 
  select(where(~is.numeric(.x) && !is.na(.x))) %>% 
  as.data.frame() %>% 
  stargazer(., type = "text", title = "Summary Statistics", digits = 0,
            summary.stat = c("mean", "sd", "min", "median", "max"))
## 
## Summary Statistics
## =======================================================
## Statistic             Mean  St. Dev. Min Median   Max  
## -------------------------------------------------------
## age_39                 131    123     0   89.1    496  
## age40_59               89      80     0    69     409  
## age60_79               40      24     3    35     139  
## age80_99               23      16     2    21      78  
## age00_plus              0      0      0     0      0   
## total                 2,260   984    148  2,128  4,626 
## sum_housing_risk       283    171     4    253    728  
## housing_risk           13      8      2    11      33  
## total_poverty_status  4,785  2,089   287 4,731.5 10,421
## total_under_125pct     731    558    103   637   3,289 
## poverty_risk           18      17     2    13      95  
## z_housing_risk         -0      1     -1    -1      1   
## z_poverty_risk         -0      1     -1    -0      6   
## weighted_housing_risk  -0      0     -1    -0      0   
## weighted_poverty_risk  -0      1     -1    -0      2   
## leadriskscore_raw      -0      1     -1    -0      2   
## lead_risk_rank          4      3      1     4      10  
## -------------------------------------------------------

Visual Distributions

leadrisk %>% 
  ggplot() +
  geom_histogram(aes(x=lead_risk_rank)) +
  scale_x_continuous("Lead risk rank", breaks=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), labels=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

leadrisk %>% 
  ggplot() +
  geom_point(aes(x=housing_risk, y=poverty_risk)) +
  xlim(0, 100) +
  ylim(0, 100)

Spatial Distributions

cville_shapes <- readRDS("../cville_region_collection/data/cville_tracts.RDS")
leadrisk <- merge(cville_shapes, leadrisk, by = 'GEOID', all.x = TRUE)
leadrisk <- st_transform(leadrisk, crs = 4326)
pal <- colorFactor("viridis", reverse = TRUE, domain = leadrisk$lead_risk_rank)
leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(lead_risk_rank),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Lead Risk Rank: ", leadrisk$lead_risk_rank)) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$lead_risk_rank,
            title = "Lead Risk Rank", opacity = 0.7)
pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$housing_risk)
leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(housing_risk),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Housing Risk: ", round(leadrisk$housing_risk, 2))) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$housing_risk,
            title = "Housing Risk", opacity = 0.7)
pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$poverty_risk)
leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(poverty_risk),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Poverty Risk: ", round(leadrisk$poverty_risk, 2))) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$poverty_risk,
            title = "Poverty Risk", opacity = 0.7)
pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$leadriskscore_raw)
leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(leadriskscore_raw),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Raw Lead Risk Score: ", round(leadrisk$leadriskscore_raw, 2))) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$leadriskscore_raw,
            title = "Raw Lead Risk Score", opacity = 0.7)